A Single Disk Failure Recovery for X-code Based Parallel Storage Systems
Abstract
Redundancy coding schemes are used to guarantee data availability and reliability against disk failures. X-code is a double-fault-tolerant code that achieves optimal update complexity, and it reduces the possibility of data unavailability when a disk or node fails. Two X-code-based recovery schemes, minimum-disk-read recovery (MDRR) and group-based MDRR (GMDRR), minimize the number of disk reads for single-disk failure recovery. A tight lower bound on the number of disk reads is derived, and the MDRR algorithm matches this theoretical lower bound. However, disk reads cannot be balanced within a single stripe while matching the lower bound, nor can they be balanced across disks by simply rotating the disks. GMDRR therefore uses a leap rotation scheme that balances disk reads across disks within a group of stripes while still matching the lower bound on disk reads. MDRR reduces recovery time by around 25% compared with the conventional approach.
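The idea of choosing parity chains so that fewer symbols are read can be illustrated with a small sketch. It is not the MDRR algorithm from the paper: the parity layout below is the standard X-code definition (an n x n array with n prime, two parity rows) rather than anything stated in this abstract, and the search is a plain brute force over which chain repairs each lost data symbol. All function names (`chains`, `encode`, `make_plan`, `reads`, `recover`, `best_plan`) are hypothetical and chosen for illustration only.

```python
from itertools import product
import random


def chains(n):
    """Return the diagonal and anti-diagonal parity chains of an n x n X-code array.

    Each chain is a list of (row, col) cells whose XOR is zero; the first cell
    is the parity cell (row n-2 or n-1), the rest are data cells (rows 0..n-3).
    """
    diag = [[(n - 2, i)] + [(k, (i + k + 2) % n) for k in range(n - 2)] for i in range(n)]
    anti = [[(n - 1, i)] + [(k, (i - k - 2) % n) for k in range(n - 2)] for i in range(n)]
    return diag, anti


def encode(data, n):
    """Append the two parity rows to an (n-2) x n list of integer data symbols."""
    arr = [row[:] for row in data] + [[0] * n, [0] * n]
    diag, anti = chains(n)
    for chain in diag + anti:
        (pr, pc), members = chain[0], chain[1:]
        value = 0
        for r, c in members:
            value ^= arr[r][c]
        arr[pr][pc] = value
    return arr


def make_plan(n, failed, choice):
    """Pick one chain per lost cell of column `failed`.

    choice[k] == 0 repairs data cell (k, failed) through its diagonal chain,
    choice[k] == 1 through its anti-diagonal chain. The two lost parity cells
    can only be rebuilt from their own chains.
    """
    diag, anti = chains(n)
    plan = []
    for k in range(n - 2):
        chain = diag[(failed - k - 2) % n] if choice[k] == 0 else anti[(failed + k + 2) % n]
        plan.append(((k, failed), chain))
    plan.append(((n - 2, failed), diag[failed]))
    plan.append(((n - 1, failed), anti[failed]))
    return plan


def reads(plan):
    """Set of distinct surviving cells that the plan has to read."""
    cells = set()
    for lost, chain in plan:
        cells.update(cell for cell in chain if cell != lost)
    return cells


def recover(arr, plan):
    """Rebuild every lost cell as the XOR of the other cells in its chain."""
    rebuilt = {}
    for lost, chain in plan:
        value = 0
        for r, c in chain:
            if (r, c) != lost:
                value ^= arr[r][c]
        rebuilt[lost] = value
    return rebuilt


def best_plan(n, failed):
    """Brute-force the chain choice that minimizes the number of distinct reads."""
    plans = (make_plan(n, failed, ch) for ch in product((0, 1), repeat=n - 2))
    return min(plans, key=lambda p: len(reads(p)))


n, failed = 5, 2                      # assumed example size and failed disk
rng = random.Random(0)
arr = encode([[rng.randrange(256) for _ in range(n)] for _ in range(n - 2)], n)
conv = make_plan(n, failed, (0,) * (n - 2))   # data cells repaired by diagonal chains only
best = best_plan(n, failed)
print("diagonal-only reads:", len(reads(conv)))
print("best mixed-chain reads:", len(reads(best)))
```

Mixing diagonal and anti-diagonal chains lets some surviving symbols serve two repairs at once, which is why the mixed plan never needs more distinct reads than the diagonal-only plan. The brute force grows as 2^(n-2) and is only practical for small n.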
Similar resources
A Non-MDS Erasure Code Scheme for Storage Applications
This paper investigates the use of redundancy and self repairing against node failures in distributed storage systems using a novel non-MDS erasure code. In replication method, access to one replication node is adequate to reconstruct a lost node, while in MDS erasure coded systems which are optimal in terms of redundancy-reliability tradeoff, a single node failure is repaired after recovering the ...
S-Code: Lowest Density MDS Array Codes for RAID-6
RAID, a storage architecture designed to exploit I/O parallelism and provide data reliability, has been deployed widely in computing systems as a storage building block. In large scale storage systems, in particular, RAID-6 is gradually replacing RAID-5 as the dominant form of disk arrays due to its capability of tolerating concurrent failures of any two disks. MDS (maximum distance separable) ...
Failure Recovery Issues in Large Scale, Heavily Utilized Disk Storage Systems
Large data is increasingly important to large-scale computation and data analysis. Storage systems with petabytes of disk capacity are not uncommon in high-performance computing and internet services today and are expected to grow at 40-100% per year. These sizes and rates of growth render traditional, single-failure-tolerant (RAID 5) hardware controllers increasingly inappropriate. Stronger pr...
A Robust Storage System Architecture
Error-correcting codes allow either incorrect data to be corrected or missing data to be rebuilt. They are frequently used with communications channels to recover data lost through line noise and thus provide a ‘noise free’ bit pipe. Data can also be lost through hardware failure; for instance a disk crash. In case of hardware failure, we want a storage system that has the robustness and tunabi...
Optimal Repair of MDS Codes in Distributed Storage via Subspace Interference Alignment
It is well known that an (n, k) code can be used to store information in a distributed storage system with n nodes/disks. If the storage capacity of each node/disk is normalized to one unit, the code can be used to store k units of information, where n > k. If the code used is maximum distance separable (MDS), then the storage system can tolerate up to (n−k) disk failures (erasures), since the ...